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Abstract 


Multiple importance sampling (MIS) methods use a set of proposal distributions from which samples are drawn. 
Each sample is then assigned an importance weight that can be obtained according to different strategies. This work 
is motivated by the trade-off between variance reduction and computational complexity of the different approaches 
(classical vs. deterministic mixture) available for the weight calculation. A new method that achieves an efficient 
compromise between both factors is introduced in this paper. It is based on forming a partition of the set of proposal 
distributions and computing the weights accordingly. Computer simulations show the excellent performance of the 
associated partial deterministic mixture MIS estimator. 


Index Terms 

Monte Carlo methods, multiple importance sampling, deterministic mixture, adaptive importance sampling. 


I. Introduction 

Importance sampling (IS) methods approximate statistical moments of a variable of interest by sets of samples, 
drawn from a proposal distribution different from the targeted one, and weights, assigned to the samples in order to 
measure their adequacy in approximating the target |[T|, Q. In its standard form, IS uses one proposal distribution 
from which all samples are drawn. However, a more powerful strategy to reduce the variance of the estimators 
consists of using a set of different proposal distributions. This is the basis of multiple importance sampling (MIS) 
techniques Q. The samples drawn from the different proposals in MIS are then assigned weights proportional to the 
ratio between the target and the proposal densities evaluated at the sample values. Several strategies for calculating 
the weights have been considered, depending on the way in which the proposal evaluation in the denominator is 
interpreted. In the standard MIS, the evaluated proposal is exactly the one from which the sample was generated. 
This constitutes the simplest method in terms of computational complexity. A different approach, referred to as 
deterministic mixture (DM) MIS, interprets the set of generating proposals as a whole mixture distribution and 
calculates the weight of a given sample by considering the entire mixture as the global proposal |)4l. This method 
attains a variance reduction at the expense of an increase in the computational load Il5l. ll^. 

In this work, we propose a novel MIS method that provides an efficient tradeoff in terms of computational 
complexity and variance reduction of the associated IS estimators. The approach is based on creating a partition 
of the set of available proposals and considering that each partitioned set constitutes a mixture distribution. A 
sample drawn from a mixand in one of the partitions (mixtures) is then assigned a weight that only accounts for 
that particular mixture, instead of the entire composite mixture as in the full DM-MIS. A remarkable reduction 
in computational complexity is achieved by this approach, while the variance of the associated partial DM-MIS 
estimator remains comparable to that of the full DM-MIS estimator. The method can not only applied to IS methods 
with static distributions (i.e., characterized by fixed parameters), but also to IS methods that adapt the parameters 
of the proposal distribution in an iterative way (i.e., adaptive IS (AIS) methods Q, lUl, Q). Computer simulations 
show the excellent performance of the proposed scheme in terms of variance reduction for a given computational 
load. 
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II. Problem statement and background 

In many applications, the interest lies in obtaining the posterior density function (pdf) of set of unknown 
parameters given the observed data. Mathematically, denoting the vector of unknowns as x G 2? C and the 
observed data as y G M^, the pdf is defined as 

^(x|y) = 0^ 7r(x|y) = 2(y|x)g(x), (1) 

where 2(y|x) is the likelihood function, g{x.) is the prior pdf, and Z{y) is the normalization factorQThe computation 
of a particular moment of x is then given by 

^ ^ /(x)7r(x)(ix, (2) 

where /(•) can be any integrable function of x. In many practical scenarios, we cannot obtain an analytical solution 
of Q and Monte Carlo methods are used to approximate it. 


A. Importance Sampling (IS) 

Let us consider N samples (xi,... ,xjv) drawn from a proposal pdf, (?(x), with heavier tails than the target, 
7r(x). The samples have associated importance weights given by 


Wi = 


7r(xj) 

q{^i)'' 




Using the samples and weights, the moment of interest can be approximated as 

N , N 


hs = 


1 


E,=i wj 




NZ^ 

2=1 


X,; 


( 3 ) 


( 4 ) 


where Z = ^ unbiased estimator of Z = J^7r(x)(ix |[T]. Note that Eq. Q always provides a 

consistent estimator of I, but its variance is directly related to the discrepancy between 7r(x)|/(x)| and g(x) (for a 
specific choice of /) or to the mismatch between the target 7r(x) and the proposal q{-x) (for a general and arbitrary 

/) m, ca. 


B. Multiple Importance Sampling (MIS) 

Finding a good proposal pdf, g'(x), is critical and can also be very challenging |I4|. An alternative and more 
robust strategy consists of using a population of different proposal pdfs. This scheme is often known in the literature 
as multiple importance sampling (MIS) IT], JH, 0. Let us consider a set of N (normalized) proposal pdfs, 
g'i(x),... ,g'iv(x), and let us assume that exactly one sample is drawn from each of them, i.e., x* ~ qi(x), i = 
1,..., A^. The importance weights associated to these samples can then be obtained according to one of the following 
strategies: 


• Standard MIS: Wi = i = 1,... ,N. 

• Deterministic mixture MIS (DM-MIS) @|: 


7r(xi) _ 7r(xj) 


f = l,...,iV, 


( 5 ) 


where V’(x) = ^ mixture pdf, composed of all the proposal pdfs. This approach interprets 

the complete set of samples, {x*}^^, as being distributed according to the mixture V'(x), i.e., {xi,..., xjv} ~ 
?/)(x). See Appendix [A| for further details. 

In both cases, the consistency of the estimators is ensured. The main advantage of the DM-MIS weights is that 
they yield more stable and efficient estimators |j4|, i.e., with less variance (as proved in Appendix]^. However, 


'From now on, we remove the dependence on y to simplify the notation. 
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the DM-MIS estimator is computationally more expensive, as it requires N evaluations of the proposal to obtain 
each weight instead of just onej^ In some practical scenarios, this additional load may he excessive (especially for 
large values of N) and alternative efficient solutions must he developed. 


C. Adaptive Importance Sampling (AIS) 

In order to decrease the mismatch between the proposal and target pdfs, there are several Monte Carlo methods 
that iteratively adapt the parameters of the proposal pdf using the information of the past samples Q, lO, Q. 
In this scenario, we have a set of proposal pdfs = 1, • • • , J}f=i, where the superscript t indicates the 

iteration index and T is the total number of adaptation steps. Some of these well-known AIS methods, which are 
also based on MIS, are Population Monte Carlo (PMC) and its variants iQ, ifTTI . lIT^ . lIT^ . adaptive multiple 
importance sampling (AMIS) iH, and adaptive population importance sampling (APIS) IS, ||9l. 


III. Partial Deterministic Mixture approach 

In this section we develop a partial DM-MIS scheme, which groups the proposal pdfs, {qj(x)}^^, into P 
mixtures composed of M mixands, with PM = N (recall that x* ~ = 1, • • • , A")Namely, we define a 

partition of {1,..., A} into P disjoint subsets of M indices, Sp with p = I,... ,P, s.t. 

5i U 52 U...U5p = {!,..., A}, (6) 


where = 0 for all k,£ = 1,... ,P and k ^ i. Each subset, Sp = {jp,i,jp, 2 , ■ ■ ■ ,jp,M}, contains M indices, 

jp,m £ {!) • • ■) for m = 1,..., M and p = 1,... ,P. Following this strategy, the weights of the p-th mixture 
are computed as 

vrfxj) 7r(xj) 


Wi = 




i G Sp. 


(V) 


The resulting partial DM-MIS (p-DM-MIS) estimator is then given by 

1 ^ 

Vdm-mis = ^ -u;i/(xj), (8) 

Mj=i Wj 

which coincides with the expression in Q, but using the weights given by Eq. Q. Note that the particular cases 
P = 1 and P = N correspond to the full DM-MIS (f-DM-MIS) and the standard MIS (s-MIS) approaches, 
respectively. The number of evaluations of the proposal pdfs is then PM"^. Since A < = NM < A^, the 

computational cost is larger than that of s-MIS approach (M times larger), but lower than that of the f-DM-MIS 
approach (since M < N). 

The good performance of the novel approach is ensured by Theorem [T] and Corollary (see Appendix and 
can be summarized by the following expression: 


Var(/f_DM-Mis) < Var(ip_DM-Mis) < Var(is-Mis), (9) 

which holds regardless of the choice of P and the strategy followed to group the original proposals {qi(x)}^p 
into mixtures. Therefore, there is a tradeoff between performance and computational cost: using a smaller number 
of mixtures (P) leads to a reduced variance, but at the expense of an increase in the number of evaluations of the 
proposal densities. 


A. Choice of the number of mixtures P 

A simple strategy to choose the number of mixtures consists of starting with P = A (which coincides with 
the standard MIS scheme), computing the corresponding estimator in Eq. Q, and iteratively reducing the number 
of mixtures P (thus increasing M = ^) while the estimation significantly changes w.r.t. the previous step. This 
iterative approach does not require a significant additional computational cost (since the proposal evaluations can 
be stored and re-used) and results in an efficient and automatic procedure to select P. 

^Note that the number of evaluations of the target 7r(x) is the same regardless of whether the weights are calculated according to 0 or 

^For the sake of simplicity, we assume that all the mixtures contain the same number of proposal pdfs. However, any strategy that clusters 
the N proposals into P disjoint mixtures (regardless of their size) is valid. 
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Method 

Weight calculation 

Complexity 

Variance 

PMC □ 


Lowest (JT = N) 

Highest 

APIS (9) 

vTP) 

Medium (PT = JN) 

Medium 

p-DM-MIS 


Suitable (PM'^ = MN) 

Low 

AMIS m 

TAP) 

Highest (P = A^) 

Lowest 

f-DM-MIS 


Highest {PP 

Lowest 


TABLE I: Different strategies for weight calculation in AIS algorithms using J proposals per iteration (note that 
J = 1 in AMIS), and T iterations. 


B. Design of the P mixtures 

Developing an optimal strategy to cluster the proposals into P mixtures is a difficult task, since the number of 
possible configurations is extremely large. Indeed, unless this clustering strategy is computationally inexpensive, 
the additional computational effort might be better invested in decreasing P (thus increasing M = ^ and reducing 
the variance of the estimators). Therefore, we propose applying a simple random clustering strategy, where M = ^ 
different proposals are randomly assigned to each partition. This approach provides an excellent performance for 
large values of M (see Table 0 - so there seems to be little room for improvement (except maybe for small values 
of M). Note that this result is not surprising: randomness is the key element behind compressive sampling |[T4l . 
and many randomized clustering algorithms have been developed for applications such as data mining lITSl . image 
processing or blind channel identification IITtII . 

C. Application to AIS schemes 

For the sake of simplicity, we have focused on p-DM-MIS for a static framework, where the parameters of 
the proposals are fixed. However, all fhe previous considerations can be easily applied in AIS schemes, where 
fhe proposals are iferafively updated. In methods like PMC Q or APIS ||9l, a population of J proposal densities 
is adapted during T iterations. At the f-th iteration, the j-th sample is drawn from the j-th proposal, i.e., 

~ q^^\x) for J = 1,..., J and t = 1,..., T. Thus, after T iterations we have N = JT samples drawn from 
N = JT different proposal pdfs. 

Regardless of the adaptation procedure followed by each algorithm, different strategies can be used to design 
an efficient DM-MIS estimator when considering all the N = JT proposal pdfs. Table |I] summarizes the weight 
calculation for three well-known AIS methods (PMC, AMIS and APIS), comparing them to the DM-MIS estimators. 
We also analyze the complexity in terms of the number of proposal evaluations and the performance in terms of 
the variance reduction. We can see that the novel p-DM-MIS approach provides a very good compromise between 
performance and computational cost. Finally, it is important to remark that Table does not take into account the 
specific adaptive procedures of each algorithm, which can also have a large influence on the final performance. 

IV. Numerical example 

We consider a bivariate multimodal target pdf, defined as a mixfure of 5 Gaussians: 

1 

7 r(x) = - ^AA(x;r'i,Si), x € ( 10 ) 

i=l 

where ui = [-10,-10]^, 02 = [0,16]^, P 3 = [13,8]^, 1/4 = [-9,7]^, 05 = [14,-14]^, Si = [2, 0.6; 0.6, 1], 
Sia = [2, -0.4;-0.4, 2], S 3 = [2, 0.8; 0.8, 2], S 4 = [3, 0;0, 0.5] and S 5 = [2, -0.1;-0.1, 2]. The goal is to 
approximate, using some Monte Carlo method, the expected value of X ~ vr(x), i.e., E[X.] = xTT{x)dx and 
^ = /r 2 n{x)dx. 

We apply the MIS algorithm in a setup with N = 4096 Gaussian proposal pdfs, {Q'i(x) = A7(x;/Xj, Ci)}^i, 
where pti ~ (7([—20,20] x [—20,20]) is the randomly chosen location parameter and Cj = with cr = 5, 
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Parameters 

MSE(P[X]) 

MSE(^) 

Evaluations 

P = 4096 (M = 1): s-MIS 

6.8129 

0.0743 

4096 

P = 2048 (M = 2) 

3.3832 

0.0273 

8192 

P = 1024 (M = 4) 

1.4723 

0.0120 

16384 

P = 512 (M = 8) 

0.9678 

0.0076 

32768 

P = 256 (M = 16) 

0.8078 

0.0064 

65536 

P = 128 (M = 32) 

0.7730 

0.0060 

131072 

P = QA{M = 64) 

0.7648 

0.0058 

262144 

P = 32 (M = 128) 

0.7506 

0.0058 

524288 

P = 16 (M = 256) 

0.7486 

0.0058 

1048576 

P = 8iM = 512) 

0.7448 

0.0058 

2097152 

P^4:(M = 1024) 

0.7416 

0.0058 

4194304 

P = 2(M = 2048) 

0.7414 

0.0058 

8388608 

P = 1 (M = 4096): f-DM-MIS 

0.7406 

0.0058 

16777216 


TABLE II: MSE in the estimation of the mean and normalizing constant of the target for different values of P and 

M. 



Eig. 1: MSE in the estimation of the mean of the target w.r.t. the total number of proposal evaluations. 

is the scale matrix. We proceed as follows. Eirst, we draw a sample from each proposal. Then, we compute the 
corresponding weight according to ([T]). Einally, we huild the estimator ip_DM-Mis using ^ for different number of 
mixtures P = 2*^ for A: = 0,1,..., 12 (i.e., P = 1,2,..., 2048,4096). Since PM = N, the number of proposals 
per mixture is M = 2^^“^ = 4096, 2048,..., 2,1. Note that the case P = 1 (i.e., M = 4096) corresponds to 
the f-DM-MIS approach, while P = 4096 (i.e., M = 1) corresponds to the s-MIS. A random assignment of the 
proposals to the mixtures is performed in all cases. 

Table [n| shows the mean square error (MSE) in the estimation of the mean of the target P[X] (averaged over both 
dimensions) and the normalizing constant Z. All results are averaged over 500 independent runs. The last column 
of Table shows the total number of proposal evaluations for each value of M. Eigure [T] shows the MSE in the 
estimation of the mean of the target w.r.t. the total number of proposal evaluations. The results show that decreasing 
P reduces the MSE (as expected) at the expense of an increase in the computational cost (measured by the number 
of proposal evaluations). However, note that decreasing the number of mixtures below P = 64 does not improve 
the performance significantly. Indeed, the p-DM-MIS with P = 64 obtains an MSE close to that of the f-DM-MIS 
estimator while performing 98.4% less proposal evaluations, thus attaining an excellent performance-cost tradeoff. 

V. Conclusion 

In this paper we propose a novel approach for the calculation of the weights in multiple importance sampling 
schemes that provides an efficient tradeoff between computational complexity and variance reduction of the associ¬ 
ated estimator. The proposed scheme is based on constructing a random partition of the set of available proposals and 
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then calculating the weight of each sample locally using only the corresponding subset of the partition. Computer 
simulations reveal a very good performance of the method, which is able to attain an excellent performance- 
computational cost tradeoff. 


Appendix A 

Drawing samples erom a mixture of pdfs 
L et us consider a mixture of N normalized pdfs with equal weights, i.e., 

1 ^ 

V'(x) = — (11) 

i=l 

The classical procedure for drawing N samples from V'(x) is (starting with A: = 1): 

1) Draw y* G {1, ..., N} with equal weights 

2) Draw ~ 

3) Set k = k + 1 and repeat until k = N. 

In this way, each sample is distributed according to and, as a consequence, {xi, ..., x^v} ~ V’(x)- An 
alternative procedure, more deterministic than the previous one, consists of the following steps (starting with i = 1): 

1) Draw one sample from each gi(x), i.e., Xj ~ qi(x). 

2) Set i = i + 1 and repeat until i = N. 

In this case, we have Xj ~ (?i(x) for i = 1,..., A^, but the joint set is again distributed as V’(x), i.e., {xi, X 2 , ..., xtv} ~ 
V’(x). This is the underlying theoretical argument of the deterministic mixture (DM) weights approach used 
throughout this work. Furthermore, given M indices jr G {1,..., A^} with r = 1,..., M: 

{Xji, . . . , Xy„ } ~ (x) + . . . + (x). 


Appendix B 

Variance reduction of deterministic mixture weights 

Theorem 1. Consider a normalized target pdf, 7r(x) = ^7r(x), and N samples drawn from a set of N normalized 
proposal pdfs (one from each pdf), Xj ~ qi{x.) for i = 1,2,..., N. The standard multiple importance sampling 
(s-MIS) and the full deterministic mixture IS (f-DM-IS) estimators can be expressed as 

j _ 1 ^/(Xi)7r(x0 

J-S-MIS — 2_^ TTITX 5 U-^a) 

i=l 
N 


If-DM-MIS 


N 

1 


E- 


qiixi) 

/(xi)7r(xj) 




(12b) 


The variance of the f-DM-IS estimator is always lower or equal than the variance of the s-MIS estimator, 

Var{if.oM-is) < Var{is-Mis)- (13) 

Proof: See Appendix A of Q, IS. □ 

Corollary 2. For the standard MIS, full DM-MIS and partial DM-MIS estimators the following inequality holds: 

Var{If.DM-Mis) < Var{Ip.DM-Mis) < Var{Is.M[s)- 

Proof: For the first inequality, note that the full DM-MIS estimator of ( |12b| ) can also be expressed as 

f ^ /(xi)7r(xi) 

/f-DM-Mis pyP ,, y 

P=iie5p pl^p=i 


V’p(Xi) 


1 

M 


jSiSp 


where 
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is the proposal associated to the p-th mixture. Similarly, the partial DM-MIS estimator is given hy 

p 

I . 

-^p-DM-MIS 


p=l i&Sp 


f{Xi)TT{Xi) 

V’p(xi) 


Hence, applying Theorem with instead of qi{^i) and p V'p(xj) instead of ^ we have 

Var(/f_DM-Mis) < Var(ip_DM-Mis)- 

For the second inequality, following the same approach, the standard MIS estimator of (12a I can he rewritten as 

p 

f _ 1 /(xj)7r(xj) 

fs-MIS - ^ 2^ 2^ 


P=i ieSp 


Qii^i 


Applying Theorem^with V'p(xj) = js Y.v=i V’p(xi) instead of jj Y.f=i we have Var(ip_DM-Mis) < Var(/s-Mis)- 
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